engineering technologies when applied to the development of
applications software. The SEL was created in 1977 and has 
NASA/GSFC (Systems Development and Analysis Branch)

The University of Maryland (Computer Sciences Department) 

Computer Sciences Corporation (Flight Systems Operation) 

The goals of the SEL are (1) to understand the software de- 
velopment process in the GSFC environment; (2) to measure 
the effect of various methodologies, tools, and models on 
this process; and (3) to identify and then to apply success- 
ful development practices. The activities, findings, and 
recommendations of the SEL are recorded in the Software En- 
gineering Laboratory Series, a continuing series of reports 
that includes this document. A version of this document was 
also issued as Computer Sciences Corporation document 
ABSTRACT


This document reports the evaluations of and recommendations 
for the use of software development measures based on the 
practical and analytical experience of the Software Engi- 
neering Laboratory. It describes the basic concepts of 
measurement and a system of classification for measures. 

The principal classes of measures defined are explicit, 
analytic, and subjective. Some of the major software meas- 
urement schemes appearing in the literature are reviewed. 

The applications of specific measures in a production
environment are explained. These applications include
prediction and planning, review and assessment, and
evaluation and selection.
SECTION 1 - INTRODUCTION


Effective software development management depends on the ac- 
curate measurement of project attributes. This document 
reviews the state of the art of practical software measure- 
ment, which is still, to some extent, an art rather than a 
science. Substantial research is in progress, however, and 
innovations have been rapid in this vital area. Major im- 
provements in both the collection and interpretation of 
measures are expected in the next few years. However, cer- 
tain lessons can be applied now. 

Many different measures have been proposed in the literature 
(Reference 1). (No distinction is made in this document be- 
tween the meaning of "measure" and that of "metric.") Dur- 
ing the past 6 years, the Software Engineering Laboratory 
(SEL) has made a major effort to understand, verify, and 
apply these measures to the software development process, as 
well as to develop new ones and refine existing ones. This 
document presents some evaluations of and recommendations 
for the application of software development measures and 
metrics, based on the practical and analytical experience of 
the SEL. 

Measures appeal to the software engineering researcher and 
software development manager as potential means of defining, 
explaining, and predicting software quality characteristics, 
especially productivity, reliability, and maintainability. 
The software manager in particular needs to be able to de- 
termine the quality of a software project at every point in 
its life cycle. Questions that measures can answer include 
the following: 

• Is this software project on schedule? 

• How many errors can be expected? 
• Is this methodology effective?

• How good is this product? 

The reader will obtain an understanding of the theory of 
software measures and their application to questions such as 
these. This document is intended to serve as a reference 
for the technical manager of software development projects 
who desires to monitor and review ongoing development, pre- 
dict cost and quality, and evaluate alternative development 
techniques. Another document (Reference 2) discusses the 
difficulties and priorities of collecting measures and data 
in general. 

This document presents the general concepts of software 
measurement, reviews the work done to date, and then demon- 
strates the application of these concepts in a production 
environment. Its scope will be expanded as the SEL learns 
more about measures. In particular, a major effort is under 
way to identify measures that can be applied early in the 
development process (i.e., during requirements and design). 
The results of these studies will move us closer to the 
final goal of putting the academics of measures into the 
hands of the software practitioner. 

1.1 DOCUMENT ORGANIZATION 

This document consists of five major sections. Section 1
introduces some concepts of software measurement and de- 
scribes the source of the analyzed data and the basis of the 
practical experience. References 2 and 3 present more de- 
tailed explanations of this material. 

Section 2 explains a classification scheme for software 
measures. Organizing the available measures in this manner 
facilitates their systematic consideration. Some commonly 
used software measures are explained within the context of 
the classification scheme. The following classes of
measures are defined by this scheme: 

• Explicit 

• Analytic 

• Subjective 

Section 3 summarizes the efforts of the SEL to evaluate the 
available software measures. Studies of each of the three
classes of measures are described.

Section 4 demonstrates the application of measures to soft- 
ware development management for each phase of the software 
life cycle. The following applications of measures are
considered:

• Prediction (for planning) 

• Review (for assessment) 

• Evaluation (for selection) 

Section 5 reiterates the major conclusions and indicates the 
direction of current SEL research. 

1.2 CONCEPTS OF MEASUREMENT 

Measurement is the process of assigning a number or state to 
represent a physical quantity or quality. The need to meas- 
ure the quantity and quality of developed software is 
self-evident. Measures of productivity, reliability, main- 
tainability, and complexity, for example, are vital to soft- 
ware development planning and management. 

A large number of measures have been proposed by re- 
searchers, not all of which are equally useful in practice. 
This document is an attempt to organize the available meas- 
ures in a rational manner and to identify those that have 
been employed successfully in a production environment. 
Reference 2 explains a three-dimensional scheme for the
classification of software measures. These dimensions are 
development component, level of detail, and measurement 
method. The first two are useful in planning and implement- 
ing a program of data collection. The last is essential to 
the interpretation and application of software measures, 
which is the focus of this document. 

The first dimension of classification is the development 
component. The software development activity can be divided 
into discrete components, as shown in Figure 1-1. The com- 
ponents included in this model are the following: 

• Problem--The software problem as described in the
requirements specification and constraints

• Environment--Characteristics of the development
installation and personnel

• Product--The software and documentation produced by
the development effort

• Process--The procedures, techniques, and methodologies
employed in developing the product



Figure 1-1. Software Development Model (figure not reproduced)
Measurements can be classified based on the components
(i.e., problem, environment, product, process) to which they 
apply. Examples of problem measures include the number and 
complexity of requirements. Programming language and devel- 
opment computer are characteristics of the environment. 
Product measures include lines of code and pages of documen- 
tation. Team size and methodology use are examples of meas- 
ures that characterize the process. 

The software development process is the component most 
easily manipulated by managers and must be carefully moni- 
tored. Simply measuring a software development project at 
its conclusion is inadequate for most purposes. Measure- 
ments must be made throughout the life of a software proj- 
ect. Figure 1-1 shows the decomposition of the process into 
seven life cycle phases. Although this is the basic life 
cycle definition used by the SEL, a simpler sequence con- 
sisting of design, implementation, and testing is also used 
in this document as a heuristic device. Table 1-1 compares 
the two life cycle definitions. 

Table 1-1. Software Life Cycle Definitions
(Based on SEL Experience)

     Detailed Life Cycle               Simplified Life Cycle

                      Percent of                     Percent of
Phase                 Schedule       Phase           Schedule

Requirements Analysis      5
Preliminary Design        10         Design              30
Detailed Design           15

Implementation            40         Implementation      40

System Testing            20         Testing             30
Acceptance Testing        10
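
The correspondence between the two life cycle definitions can be checked with a short Python sketch. The percentages are those of Table 1-1; the dictionary and function names are hypothetical and chosen only for illustration.

```python
# Percent of schedule for each detailed phase, as listed in Table 1-1.
DETAILED_SCHEDULE = {
    "Requirements Analysis": 5,
    "Preliminary Design": 10,
    "Detailed Design": 15,
    "Implementation": 40,
    "System Testing": 20,
    "Acceptance Testing": 10,
}

# Grouping of detailed phases into the simplified life cycle.
SIMPLIFIED_PHASE = {
    "Requirements Analysis": "Design",
    "Preliminary Design": "Design",
    "Detailed Design": "Design",
    "Implementation": "Implementation",
    "System Testing": "Testing",
    "Acceptance Testing": "Testing",
}


def simplified_schedule(detailed, grouping):
    """Sum the detailed-phase percentages within each simplified phase."""
    totals = {}
    for phase, percent in detailed.items():
        group = grouping[phase]
        totals[group] = totals.get(group, 0) + percent
    return totals


if __name__ == "__main__":
    print(simplified_schedule(DETAILED_SCHEDULE, SIMPLIFIED_PHASE))
```

Summing within each group reproduces the simplified percentages (30, 40, and 30), and both definitions account for the full schedule.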
Another dimension of classification is level of detail (or
resolution). Measurements can be performed at several
levels of detail: 

• Project--High-level summary

• Component--Discrete project parts, such as subsystems,
modules, COMMON blocks, etc.

• Event--Occasional or periodic occurrences, such as
changes, computer runs, etc.

The level of detail of measures collected depends on the 
manager's perspective and cost constraints. As discussed in 
Section 4, these measures can provide useful feedback to 
managers and developers. (Reference 2 presents a more de- 
tailed discussion of cost considerations.) 

The final and most important dimension (from the perspective 
of this document) is method of measurement. Measurements 
can be obtained by several different methods: 

• Explicit--Simple numeric counts, averages, and
other directly obtained indicators (e.g., lines of
code and errors per module)

• Analytic--Complex measures based on assumptions
about the relationships among software features
(e.g., Halstead length and cyclomatic complexity)

• Subjective--Ratings of quality and use arrived at
by high-level review and comparison (e.g., traceability
and completeness)

The specific measurement method employed implies certain 
beliefs about the nature of the software development proc- 
ess. This dimension of classification (method of measure- 
ment) is the basis for the organization of Sections 2 and 3 
of this document. Section 2 reviews some of the major
measurement proposals of each type. Section 3 summarizes the
results of SEL research in each of these areas. 

1.3 SOFTWARE ENGINEERING LABORATORY 

The SEL is a cooperative effort of Goddard Space Flight
Center (GSFC), Computer Sciences Corporation (CSC), and the
University of Maryland (UM). The SEL collects and analyzes
data from software development projects that support flight 
dynamics activities at GSFC. More than 40 projects have 
been monitored by the SEL during its 7-year life. SEL prin- 
cipals also participate in the management of these proj- 
ects. The recommendations presented in this document are 
based on this analytical and practical experience. Refer- 
ence 3 describes the SEL and its activities in more detail. 

The general class of spacecraft flight dynamics software 
studied by the SEL includes applications to support attitude 
determination, attitude control, maneuver planning, orbit 
adjustment, and mission analysis. The attitude systems, in 
particular, form a large and homogeneous group of software 
that has been studied extensively. Table 1-2 summarizes the 
major characteristics of the software developed in this
environment.

Measures have been collected and analyzed regularly from 
these projects. The bibliography included in this document 
contains numerous reports of the results of these analyses. 
Reference 4 describes a recent study incorporating more than 
600 measures of special interest to software development
managers.
Table 1-2. Software Development Environment

Type of Software: Scientific, ground-based, interactive
graphic with moderate reliability and response requirements

Languages: 85 percent FORTRAN, 15 percent assembler macros

Machines: IBM S/360 and 4341, batch with TSO

Process Characteristics             Average     High      Low

Duration (months)                     15.6      20.5     12.9
Effort (staff-years)                   8.0      11.5      2.4
Size (1000 LOC)
   Developed (a)                      57.0     111.3     21.5
   Delivered (b)                      62.0     112.0     32.8
Staff (full-time equivalent)
   Average                             5.4       6.0      1.9
   Peak                               10.0      13.9      3.8
Individuals                             14        17        7
Application Experience (years)
   Managers                            5.8       6.5      5.0
   Technical Staff                     4.0       5.0      2.9
Overall Experience (years)
   Managers                           10.0      14.0      8.4
   Technical Staff                     8.5      11.0      7.0

(a) New lines of code plus 20 percent of reused lines of code.
(b) Total lines of code.
SECTION 2 - SURVEY OF MEASUREMENT APPROACHES


A classification of the available software development meas- 
ures is a prerequisite for any systematic evaluation of 
them. Many of the measures that have been proposed are 
similar to each other. A classification scheme provides a 
mechanism for avoiding unnecessary duplication while ensur- 
ing full coverage of all important software development 
characteristics. The following classes of measures will be 
discussed here: 

• Explicit 

• Analytic 

• Subjective 

The following sections define these classes, show their 
logical relationship to each other, and outline some of the 
major measurement proposals. The reader can consult the 
references for more detailed explanations. 

2.1 EXPLICIT MEASURES 

The class of explicit measures contains the easiest to 
understand and most widely used measures. This class in- 
cludes counts and ratios directly determined from source 
code, staffing records, computer usage logs, and documenta- 
tion. Values of these measures are fixed and unambiguous 
for a given project or component, although there is some 
variability in nomenclature. (Reference 5 provides an ex- 
tensive set of definitions for these measures as well as 
other elements of software engineering.) The following are 
the most important explicit measures: 

• Developed lines of code--All newly developed lines
of code plus a fraction of reused lines of code; a
measure of size

• Lines of code per staff hour--Lines of code developed
for each staff hour expended; a measure of
productivity

• Errors per thousand lines of code--Errors detected
for every thousand lines of code developed; a measure
of reliability

These are the most widely used measures of software size and 
"quality," probably because not enough is yet understood 
about other measures. The exact hours, lines, and errors 
counted must be defined locally. Table 2-1 lists some other 
measures typical of this class. Walston and Felix (Refer- 
ence 6) studied the relationship of many such measures to 
productivity and reliability. 
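
The three principal explicit measures reduce to simple arithmetic, as the following Python sketch suggests. It assumes the 20 percent weighting of reused code given in the footnote to Table 1-2; the function names and the project figures in the example are hypothetical, and the exact hours, lines, and errors counted would still have to be defined locally.

```python
def developed_lines(new_lines, reused_lines, reuse_fraction=0.20):
    """Size: new lines plus a fraction of reused lines.

    The default fraction of 0.20 is taken from the Table 1-2 footnote;
    another installation might choose a different weighting.
    """
    return new_lines + reuse_fraction * reused_lines


def lines_per_staff_hour(dev_lines, staff_hours):
    """Productivity: developed lines of code per staff hour expended."""
    return dev_lines / staff_hours


def errors_per_thousand_lines(errors, dev_lines):
    """Reliability: errors detected per thousand developed lines."""
    return 1000.0 * errors / dev_lines


if __name__ == "__main__":
    # Hypothetical project figures, used only to exercise the functions.
    size = developed_lines(new_lines=50000, reused_lines=10000)
    print(size)
    print(lines_per_staff_hour(size, staff_hours=13000))
    print(errors_per_thousand_lines(errors=52, dev_lines=size))
```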


Table 2-1. Typical Explicit Measures

Component        Measure

Problem          Number of requirements
                 Number of interfaces
                 Number of functions

Environment      Programming language
                 Development machine
                 Programmer experience

Product          Lines of code
                 Number of modules
                 Pages of documentation

Process          Staff level
                 Development time
                 Methodology use


Although explicit measures are easily determined, they have
several limitations: they are usually available only after
the software development activity is complete; their scope
is limited to the areas of size, productivity, and
reliability; they have little explanatory power; and they
are not sensitive to the specific objectives of a software
development project. The next two subsections discuss some
alternatives and complements to explicit measures that
attempt to counter these weaknesses.

2.2 ANALYTIC MEASURES 

Analytic measures are based on some assumption or hypothesis 
about the nature of software and the software development 
process. They are intended to be sensitive to defined
"critical" properties. Examples include cyclomatic
complexity, program length, and reference span (see
following sections). The value of these measures depends on
the validity of the underlying assumption or hypothesis.
Validation of these hypotheses is an active area of software
engineering research. Analytic measures generally deal with
one of three basic software properties (Reference 7): program
size, control structure, or data structure. Each of these
properties has been studied with several different concep- 
tual approaches. Although researchers frequently disagree 
on the importance of each property and the calculation of 
specific measures of them, some analytic approaches to meas- 
ures have become well established. These approaches are re- 
viewed in the following sections. 

2.2.1 PROGRAM SIZE 

One of the most comprehensive theories and sets of measures
for software development was proposed by Halstead
(Reference 8). This "software science" is a set of
relationships between the size of a program and other
software qualities. The essential premise of software
science is that any programming task consists of selecting
and arranging a finite number of program components
(operators and operands). The number of these components
then determines the implementation effort required and the
number of errors produced. An operator is a symbol denoting
an operation, function, or action. An operand is a symbol
representing a data item or target of the action of an
operator. The following basic measures are defined by
Halstead:

• Number of unique operators ($n_1$), e.g., +, IF

• Number of unique operands ($n_2$), e.g., X, I, 200

• Total number of appearances of operators ($N_1$)

• Total number of appearances of operands ($N_2$)

Figure 2-1 shows the source listing of a simple FORTRAN
program. Its component operators and operands are identified
in Figure 2-2. (This identification was done by the Source
Analyzer Program described in Reference 9.) The values of
the basic Halstead measures for the sample program are
$n_1 = 16$, $n_2 = 21$, $N_1 = 59$, and $N_2 = 50$. These
measures can be combined to calculate some important
software properties, as shown in Table 2-2.

Table 2-2. Software Science Relationships

Quality            Equation

Vocabulary (n)     $n = n_1 + n_2$
Length (N)         $N = N_1 + N_2$
Volume (V)         $V = N \log_2 n$
Level (L)          $L = V^* / V$
Effort (E)         $E = V / L$
Faults (B)         $B = V / S^*$

NOTES: $V^*$ is the minimum volume represented by a built-in
function performing the task of the entire program.

$S^*$ is the mean number of mental discriminations
(decisions) between errors ($S^* \approx 3000$).
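
As a worked illustration, the Python sketch below applies the software science relationships to the basic counts reported for the sample program (16 unique operators, 21 unique operands, 59 operator appearances, and 50 operand appearances). The function and parameter names are hypothetical. Level is taken in Halstead's usual form, the minimum volume divided by the actual volume; because the minimum volume of the sample program is not given, level and effort are computed only when a value is supplied.

```python
import math


def halstead_measures(n1, n2, N1, N2, min_volume=None, s_star=3000.0):
    """Compute software science measures from the four basic counts.

    n1, n2     -- unique operators and unique operands
    N1, N2     -- total appearances of operators and operands
    min_volume -- V*, the minimum volume (optional; not known for
                  the sample program, so left as an assumption)
    s_star     -- S*, mental discriminations between errors
    """
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    measures = {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "faults": volume / s_star,
    }
    if min_volume is not None:
        level = min_volume / volume
        measures["level"] = level
        measures["effort"] = volume / level
    return measures


if __name__ == "__main__":
    # Basic counts reported in the text for the sample FORTRAN program.
    for name, value in halstead_measures(16, 21, 59, 50).items():
        print(name, round(value, 3))
```

For the sample counts this gives a vocabulary of 37, a length of 109, and a volume of about 568, which corresponds to roughly 0.19 predicted faults.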


Ostensibly, software science provides equations for
estimating the cost (effort) and reliability (faults) of
developed software (see Table 2-2). These equations are
based on assumptions about the mental process of
programming. Although some early studies supported the
validity of software science (Reference 10), recent work has
challenged this theory on both empirical (Reference 11) and
theoretical (Reference 12) grounds.

      SUBROUTINE TDIST (N, X, Y, DIST)
C     PASSED
      INTEGER N
      REAL X(N), Y(N), DIST
C     LOCAL
      INTEGER I, MSGNUM, K
      REAL XL, YL, DX, DY, X2, Y2, R2, R
      LOGICAL ERR
C     GLOBAL
      REAL SQRT
C     INITIALIZE
      XL = 0.0
      YL = 0.0
      DIST = 0.0
C     FOR ALL POINTS
      DO 200 I=1, N
         DX = X(I) - XL
         X2 = DX*DX
         DY = Y(I) - YL
         Y2 = DY*DY
C     CALC./CHECK SEPARATION
         R2 = X2 + Y2
         CALL VERIFY (R2, ERR)
C     OBTAIN SEPARATION
         IF (ERR) THEN
            K = I - 1
            WRITE (6, 100, ERR=300) K, I
  100       FORMAT (1X, 'ERROR, POINTS ', I3, ' AND ', I3,
     *              ' TOO CLOSE')
            R = 0.0
         ELSE
            R = SQRT (R2)
         END IF
C     ACCUMULATE
         DIST = DIST + R
         XL = X(I)
         YL = Y(I)
  200 CONTINUE
C     NORMAL RETURN
      RETURN
C     ERROR WRITING MESSAGE
  300 CONTINUE
      MSGNUM = 27
      CALL ERRMSG (MSGNUM, *400)
      RETURN
C     UNABLE TO WRITE ANY
C     MESSAGES, ABORT RUN
  400 CONTINUE
      CALL ABORT
C
      END

Figure 2-1. Sample FORTRAN Program

2.2.2 CONTROL STRUCTURE 

Another well-developed concept of measurement, cyclomatic 
complexity, was introduced by McCabe (Reference 13) in an 
attempt to quantify control flow complexity. The original 
objectives of the measure were to determine the number of 
paths through a program that must be tested to ensure com- 
plete coverage and to rate the difficulty of understanding a 
program. However, many researchers have attempted to relate
cyclomatic complexity directly to software reliability.

The basic measure is the cyclomatic number derived from the 
graphic representation of a program's control flow. Fig- 
ure 2-3 is an example of a graphic representation of the 
control flow of the program shown in Figure 2-1. The 
cyclomatic number of the program represented in the figure 
is equal to the number of disjoint regions defined by the 
edges of the graph, or the number of binary decisions plus 
one. The following is a more general formula: 

     V(G) = e - n + 2p

where V(G) = cyclomatic number of graph G
      e    = number of edges
      n    = number of nodes
      p    = number of unconnected parts

The cyclomatic number of the graph shown in Figure 2-3 is 4
(4 = 10 - 8 + 2 x 1). McCabe suggested that any program (or
module) with a cyclomatic number greater than 10 is too
complex. Another measurement scheme similarly based on
counts of decisions was proposed by Gilb (Reference 14).
Figure 2-3. Graphic Representation of Cyclomatic Complexity
(of Program in Figure 2-1)
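
The cyclomatic number is easily computed once the control flow graph has been reduced to counts of edges, nodes, and unconnected parts. The following Python sketch is illustrative only; the function names are hypothetical, and the small edge list in the example is an invented if/else graph, not the graph of the sample program.

```python
def cyclomatic_number(edges, nodes, parts=1):
    """Return V(G) = e - n + 2p for a control flow graph."""
    return edges - nodes + 2 * parts


def cyclomatic_from_decisions(binary_decisions):
    """Equivalent form for a single connected program."""
    return binary_decisions + 1


def cyclomatic_from_edge_list(edge_list, parts=1):
    """Count edges and distinct nodes from (source, target) pairs.

    The caller supplies the number of unconnected parts; this sketch
    does not try to detect them.
    """
    nodes = {node for edge in edge_list for node in edge}
    return cyclomatic_number(len(edge_list), len(nodes), parts)


if __name__ == "__main__":
    # Counts quoted in the text for the graph of the sample program.
    print(cyclomatic_number(edges=10, nodes=8, parts=1))
    # Hypothetical if/else graph: one decision, two paths that rejoin.
    print(cyclomatic_from_edge_list([(1, 2), (1, 3), (2, 4), (3, 4)]))
```

With the counts quoted in the text (10 edges, 8 nodes, 1 part), the first call returns 4, and a program with three binary decisions gives the same value through the decision-count form.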


McCabe's theory was extended by Myers (Reference 15) to in- 
clude decisions based on compound conditions. Although 
early research (Reference 16) and theoretical consideration 
(Reference 17) gave favorable indications, more recent study 
has not supported the value of cyclomatic complexity as an 
indicator of development effort or reliability. Evangelist 
(Reference 18) suggested that the measure could be reformu- 
lated to more accurately reflect control flow. Hansen (Ref- 
erence 19) has proposed a promising measure incorporating 
both cyclomatic complexity and software science measures 
that has yet to be evaluated. 

2.2.3 DATA STRUCTURE 

The reference span approach to measurement is based on the 
location of data references within a program. The span of 
reference of a data item is the number of statements between
successive references to that data item (Figure 2-4). For
example, the reference span of variable DIST in the sample 
program (Figure 2-1) is 16. Elshoff (Reference 20) has 
shown that the reference span measure varies widely. He 
also noted its implications for program complexity and read- 
ability. Long distances (reference spans) between occur- 
rences of a variable make a program difficult to understand 
and maintain. According to this theory, minimizing the 
length of reference spans minimizes program complexity. 


STATEMENT                       SPAN OF
NUMBER        STATEMENT         DATA REFERENCE

    1
    2         X = Y
    3
  ...
   27         Y = Z             SPAN OF Y = 25
   28
  ...
   65         Z = X             SPAN OF Z = 38
   66                           SPAN OF X = 63


Figure 2-4. Graphic Representation of Reference Span
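
The spans in Figure 2-4 can be reproduced with a short Python sketch. It assumes that the three statements of the figure fall at statement numbers 2, 27, and 65 and that a span is the difference between the statement numbers of successive references, which agrees with the values shown in the figure. The function name and the input layout are hypothetical.

```python
def reference_spans(references):
    """Return the spans between successive references to each data item.

    references -- list of (statement_number, [names referenced]),
                  ordered by statement number
    """
    last_seen = {}
    spans = {}
    for number, names in references:
        for name in names:
            if name in last_seen:
                spans.setdefault(name, []).append(number - last_seen[name])
            last_seen[name] = number
    return spans


if __name__ == "__main__":
    # Statement numbers assumed from Figure 2-4.
    figure = [(2, ["X", "Y"]), (27, ["Y", "Z"]), (65, ["Z", "X"])]
    print(reference_spans(figure))
```

A data item referenced more than twice yields one span per pair of successive references, so the longest or the average span can be taken as needed.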

Other approaches to measuring data structures and data flow 
have been developed recently (Reference 21). They attempt
to consider the effects of how data are used and structured
within a program as well as their volume. The relevant
measures are, however, frequently difficult to compute,
although useful simplifications can be made. For example,
Henry and Kafura (Reference 22) have developed some
relatively straightforward measures based on concepts of
information flow and system connectivity.

2.3 SUBJECTIVE MEASURES

Subjective measures are so called because they are relative 
ratings of quality rather than absolute measurements. The 
explicit and analytic measures just discussed are absolute 
measures of software properties. Absolute measures are de- 
ficient in that their scope is limited to tangible quanti- 
ties. Consequently, they are not sensitive to the specific 
quality objectives of a software development project. In 
contrast, subjective measures are often used to compare the 
actual realization of a project with its ideal or target
qualities.

The greater scope of subjective measures relative to the 
measures discussed in Sections 2.1 and 2.2 is demonstrated 
in Table 2-3. The subjective measures identified in the
table were proposed by McCall (Reference 23). Most of these
measures have no explicit or analytic counterparts. They
are intended to be used to evaluate the performance of a
software development project relative to specified quality
targets. The McCall scheme is based on combining
independent evaluations of multiple criteria to produce a
value for each measure (or factor, as McCall calls them).

Although McCall's is the best-known measurement scheme of
this type, comparable schemes have been proposed by Gilb
(Reference 14) and the SEL (Reference 4). The McCall
measures have been extended for use early in the software
life cycle (Reference 24), during maintenance
(Reference 25), and with distributed systems (Reference 26).
Values of subjective measures are, however, difficult to
determine consistently. Thus projects, especially those
from different environments, cannot easily be compared.

Table 2-3. Definition of Software Quality Factors (a)

Factor              Definition

Correctness         Extent to which a program satisfies its
                    specifications and fulfills the user's
                    mission objectives

Reliability         Extent to which a program can be expected
                    to perform its intended function with
                    required precision

Efficiency          Amount of computing resources and code
                    required by a program to perform a function

Integrity           Extent to which access to software or
                    data by unauthorized persons can be
                    controlled

Usability           Effort required to learn, operate, prepare
                    input, and interpret output of a program

Maintainability     Effort required to locate and fix an error
                    in an operational program

Testability         Effort required to test a program to ensure
                    that it performs its intended function

Flexibility         Effort required to modify an operational
                    program

Portability         Effort required to transfer a program
                    from one hardware configuration and/or
                    software system environment to another

Reusability         Extent to which a program can be used in
                    other applications; related to the
                    packaging and scope of the functions that
                    programs perform

Interoperability    Effort required to couple one system with
                    another

(a) From Reference 23, Table 3.1-1.

Another subjective measurement scheme and corresponding
development methodology proposed by Myers (Reference 27)
incorporates measures of both data and control structure.
This theory is based on the concept of program modularity.
Levels of "module strength" and "module coupling" are
defined that correspond to degrees of control cohesion and
data independence. Table 2-4 explains the levels of module
strength and coupling.

Module strength is a measure of "singleness of purpose." A 
module performing only a single function has the greatest 
strength. A module performing several unrelated functions 
has low strength. Module coupling is a measure of "depend- 
ence" between modules. Two modules linked only through data 
passed in a calling sequence have the weakest coupling. The 
use of control flags and COMMON blocks, for example, in- 
creases the level of coupling. Myers suggested that soft- 
ware quality could be improved by maximizing module strength 
and minimizing module coupling. The determination of the 
actual strength and coupling of a module is, however,
subjective, although some progress has recently been made in
quantifying these concepts (Reference 28).
Table 2-4. Levels of Module Strength and Coupling

               Level (a)          Description

Strength (b)   Functional         Single specific function

               Informational      Independent functions on
                                  common data structure

               Communicational    Multiple sequential functions
                                  related by data

               Procedural         Multiple sequential functions
                                  related by problem

               Classical          Multiple sequential functions

               Logical            Set of related functions

               Coincidental       No clearly defined function

Coupling (c)   Data               Share simple data items

               Stamp              Share common (local) data
                                  structure

               Control            Control elements passed

               External           Reference to global data item

               Common             Reference to global data
                                  structure (COMMON)

               Content            Direct reference to contents
                                  of other module

(a) Ordered from best to worst.
(b) Within a module.
(c) Between modules.
SECTION 3 - SUMMARY OF SEL RESEARCH


Although extensive research has been done in the area of 
software measures, much of it has been inconclusive. Many 
studies have had serious methodological flaws, and some
important ideas have not been tested at all (Reference 29).
Thus far, no measures have emerged that are clearly superior
to lines of code, hours of effort, and errors detected. No
other measures are widely used because none have been
demonstrated to be effective in a production environment. This
is partly due to the lack of relevant data. For example, 
there is no a priori reason to assume that a cyclomatic com- 
plexity of 10 is intrinsically superior to a cyclomatic com- 
plexity of 12. However, few software engineering data bases 
contain the detailed product information necessary to 
determine whether or not this is true. 

Any evaluation of a measure must weigh the information it 
provides about productivity, reliability, and/or maintain- 
ability against its cost of collection. The SEL is conduct- 
ing a continuing program of evaluating and refining existing 
measures and developing new ones. Reference 30 summarizes 
the results of SEL activities in this area. This section 
highlights some of the major findings about each of the 
classes of measures defined in the previous section. 

3.1 EXPLICIT MEASURES 

Early SEL experiments (Reference 31) with explicit measures 
attempted to verify the work of Walston and Felix
(Reference 6). Although similar results were obtained, some
important differences were noted. Table 3-1 compares the 
Walston-Felix data with SEL data. Differences between the 
data bases reflect the differences between the environments 
studied. Both data bases, however, showed consistent
relationships among lines of code, pages of documentation,
project duration, staff size, and programmer hours.

Table 3-1. Comparison of Walston-Felix Data With SEL Data (a)

                                         Walston-Felix     SEL
Measures                                    Median        Median

Total Source Lines (thousands)                20          49 (b)
Percent of Lines Not Delivered                 5           0
Source Lines per Staff-Month                 274         601 (b)
Documentation (pages) per Thousand Lines      69          26
Total Effort (staff-months)                   67          96
Average Staffing Level                         6           5
Duration (months)                             11          15
Distribution of Effort (percent)
   Manager                                    22 (c)      19
   Programmer                                 73 (c)      68
   Other                                       5 (c)      13
Errors per Thousand Lines                     1.4         0.8

(a) From Reference 6, Table A-9.
(b) Lines are developed lines of code.
(c) Rescaled to sum to 100 percent.

Table 3-2 shows the numerical relationships among these 
measures identified by the SEL and Walston-Felix. It should 
be noted that the coefficients and exponents in each pair of 
equations are of the same magnitude. This close agreement 
obtained from the analysis of two very different sets of 
data suggests that these relationships among explicit meas- 
ures do indeed reflect basic properties of the software de- 
velopment process. 

Table 3-2. Software Relationship Equations (a) 

                     Software Engineering 
                     Laboratory                        Walston-Felix 
Measure              Equation (b)            r^2 (c)   Equation (b)            r^2 (c) 

Effort (E)           $E = 1.4 L^{0.93}$      0.93      $E = 5.2 L^{0.91}$      0.64 
(staff-months) 

Documentation (D)    $D = 30 L^{0.90}$       0.92      $D = 49 L^{1.01}$       0.62 
(pages) 

Duration (T)         $T = 4.6 L^{0.26}$      0.55      $T = 4.1 L^{0.36}$      0.41 
(months) 

Staff size (S)       $S = 0.24 E^{0.73}$     0.89      $S = 0.54 E^{0.60}$     0.79 
(average persons) 

(a) From Reference 31, Table 1. 
(b) In the following equations, L is total lines of code. 
(c) Coefficient of determination (or r^2). 
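
The relationships in Table 3-2 can be evaluated directly. The following Python sketch is illustrative only. The names `SEL`, `WALSTON_FELIX`, `power_law`, and `estimate` are hypothetical, and the sketch assumes that L is expressed in thousands of lines of code. That assumption is not stated in the table; it is adopted here because it yields values of the same order as the medians in Table 3-1, whereas raw line counts would give implausibly large results.

```python
# Power-law relationships from Table 3-2, each of the form y = a * x**b.
# ASSUMPTION (not stated in the table): L is in thousands of lines of code.
SEL = {
    "effort": (1.4, 0.93),          # E = 1.4 L^0.93  (staff-months)
    "documentation": (30.0, 0.90),  # D = 30 L^0.90   (pages)
    "duration": (4.6, 0.26),        # T = 4.6 L^0.26  (months)
    "staff": (0.24, 0.73),          # S = 0.24 E^0.73 (average persons)
}
WALSTON_FELIX = {
    "effort": (5.2, 0.91),
    "documentation": (49.0, 1.01),
    "duration": (4.1, 0.36),
    "staff": (0.54, 0.60),
}


def power_law(coefficient, exponent, x):
    """Evaluate coefficient * x**exponent."""
    return coefficient * x ** exponent


def estimate(model, kloc):
    """Apply one column of Table 3-2 to a size given in thousands of lines."""
    effort = power_law(*model["effort"], kloc)
    return {
        "effort_staff_months": effort,
        "documentation_pages": power_law(*model["documentation"], kloc),
        "duration_months": power_law(*model["duration"], kloc),
        # Staff size is a function of effort, not of lines of code.
        "average_staff": power_law(*model["staff"], effort),
    }


if __name__ == "__main__":
    for name, model in (("SEL", SEL), ("Walston-Felix", WALSTON_FELIX)):
        result = estimate(model, 50.0)
        print(name, {key: round(value, 1) for key, value in result.items()})
```

Running both models on the same size makes the comparison in the text concrete: the exponents are close, so the two sets of estimates differ mainly through the coefficients.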

The SEL has achieved some success in applying explicit measures 
to cost estimation. Analysis of the relationships 
among productivity, lines of code, and other cost factors 
provided the empirical basis of the SEL Meta-Model for software 
cost (Reference 32). One of the liabilities of a model 
based on lines of code is that this quantity is known 
accurately only after software development is complete. 
Reasonable early estimates can, however, be made based on 
other explicit measures, such as the number of subsystems or 
modules (Reference 33). In summary, the SEL has found explicit 
measures to be very effective for some software development 
characteristics. 

3.2 ANALYTIC MEASURES 

SEL studies of analytic measures have focused on program 
size and control structure. Measures of data structure are 
still in the early stages of investigation. The Halstead 
and McCabe measures (see Section 2.2) have been carefully 
examined by the SEL (References 34 and 35). Table 3-3 summarizes 
the relationship between several measures and productivity 
and reliability. As shown in the table, neither 
cyclomatic complexity nor Halstead effort was the best pre- 
dictor of either productivity or reliability. 

A recent SEL study (Reference 36) showed that higher corre- 
lations for cyclomatic complexity and Halstead effort could 
be obtained after carefully screening the data to ensure 
sample consistency. This suggests that the minute level of 
detail of these measures (operators, decisions, etc.) makes 
them sensitive to extraneous variations in the data collec- 
tion process and programming style. They are thus unsuit- 
able for use in production environments where extensive data 
verification is not possible. 

Another approach to validating (Halstead) software science 
measures has been to show that they are internally consistent 
(Reference 34). For example, good agreement between the 
program length as predicted by software science and the ac- 
tual program length has been taken as evidence of the valid- 
ity of software science. Table 3-4 gives results from this 
type of analysis. 

Table 3-3. Predicting Effort and Errors Using Size and 
Complexity Metrics (a) 

                           Correlation 
Measure                    Effort     Errors 

Calls                      0.80       0.57 
Cyclomatic Complexity      0.74       0.56 
Calls + Jumps              0.80       0.58 
Lines of Code              0.76       0.56 
Executable Statements      0.74       0.55 
Revisions                  0.71       0.67 
Halstead Effort            0.66       0.54 

(a) From Reference 30, page 18, based on a study of SEL data. 

Table 3-4. Internal Validation of Halstead Measures (a) 

                           Correlation (r) 
Relationship               Large (b)  Small (c) 

$N \sim \hat{N}$           0.79       0.83 
$V \sim V^*$               0.52       0.50 
$L \sim \hat{L}$           0.71       0.62 
$E \sim \hat{E}$           0.61       0.42 

(a) From Reference 30, page 20, based on a study of SEL data. 
(b) Modules > 50 lines of code. 
(c) Modules < 50 lines of code. 




The correlations reported in the table cannot, however, be 
taken at face value. For any given program studied, values 
A and B can be found so that the total number of operators 
and operands can be expressed as functions of the number of 
unique operators and operands. Consider the following 
equations : 


$$N_1 = n_1 A \tag{1}$$ 

$$N_2 = n_2 B \tag{2}$$ 

$$N = n_1 A + n_2 B \tag{3}$$ 

and, according to Halstead (Reference 8): 

$$\hat{N} = n_1 \log_2 n_1 + n_2 \log_2 n_2 \tag{4}$$ 

where $\hat{N}$ = predicted program length 
      $N$ = actual program length ($N_1 + N_2$) 
      $N_1$ = total number of operators 
      $N_2$ = total number of operands 
      $n_1$ = number of unique operators 
      $n_2$ = number of unique operands 
      $A$, $B$ = constants 

Comparing Equations (3) and (4) shows that $N$ and $\hat{N}$ are both 
functions of $n_1$ and $n_2$. Because the coefficients $A$, $B$, 
$\log_2 n_1$, and $\log_2 n_2$ are all always positive, a positive 
correlation must exist between $N$ and $\hat{N}$. The correlations shown 
in Table 3-4 may not be significant after accounting for the 
fact that all these quantities are functions of $n_1$ and $n_2$. 
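
The argument above can be illustrated numerically. The following Python sketch is not an SEL analysis: it draws hypothetical values of the unique operator and operand counts and of the constants A and B independently at random, then computes the correlation between the actual length of Equation (3) and the predicted length of Equation (4). The sampling ranges, the seed, and all function names are assumptions made for illustration.

```python
import math
import random


def predicted_length(n1, n2):
    """Halstead's predicted program length, Equation (4)."""
    return n1 * math.log2(n1) + n2 * math.log2(n2)


def actual_length(n1, n2, a, b):
    """Actual program length written in the form of Equation (3)."""
    return n1 * a + n2 * b


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulated_correlation(samples=1000, seed=1):
    """Correlate N with its prediction when A and B are unrelated noise."""
    rng = random.Random(seed)
    actual, predicted = [], []
    for _ in range(samples):
        n1 = rng.randint(2, 60)      # hypothetical range of unique operators
        n2 = rng.randint(2, 200)     # hypothetical range of unique operands
        a = rng.uniform(1.0, 20.0)   # A and B are drawn independently of
        b = rng.uniform(1.0, 20.0)   # n1 and n2; they carry no information
        actual.append(actual_length(n1, n2, a, b))
        predicted.append(predicted_length(n1, n2))
    return pearson(actual, predicted)


if __name__ == "__main__":
    print(round(simulated_correlation(), 2))
```

Even though A and B are pure noise in this sketch, the resulting correlation is positive, because both lengths grow with the same two counts.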

McCabe measures have also been the focus of extensive investigation 
by the SEL. Although McCabe made no such claims 
for his theory, others have attempted to relate cyclomatic 
complexity to error rate. As reported in Section 2.2.2, 
this has been only partially successful. The SEL (Reference 37) 
has found evidence that cyclomatic complexity and 
error rate may be uncorrelated or even negatively correlated, 
which is not a very satisfying conclusion. The position of 
the SEL is that, although analytic measures seem promising 
and are intellectually appealing, their practical value has 
not been demonstrated. 

3.3 SUBJECTIVE MEASURES 

The evaluation of software quality is, at present, a matter 
of the subjective interpretation of the results of a soft- 
ware development project relative to its functional require- 
ments. This can be done best by managers and senior 
personnel associated with the project. Discussions among 
these individuals produce a consensus rating of the project 
relative to projects previously undertaken by the organiza- 
tion. Checklists and questionnaires can be employed to 
formalize the rating process. 

Subjective measures offer the flexibility of easy tailoring 
to any situation. Hundreds of such measures have been sug- 
gested (see Section 2.3); however, the selection of appro- 
priate measures is essential to systematizing the subjective 
process. Because each environment and application is 
unique, a single set of measures may not be appropriate to 
all. 

The SEL conducted an exhaustive study to determine the measures 
that best characterize the flight dynamics environment 
(Reference 4). Over 600 measures were examined, from which 
38 key properties (factors) were identified. Table 3-5 summarizes 
these results. The factors defined by the analysis 
contained about 80 percent of the information (variance) of 
the original measures. This study showed that a concise set 
of subjective measures can be devised that effectively characterizes 
a given software development environment. One 
conclusion of this study was that project size influenced 
almost all aspects of software development, including staffing, 
methodology, and stability. Subsequent research efforts 
will define the relationship of these characteristics 
to the quality of the final software product. The general 
conclusion of the SEL is that, although subjective measures 
lack the precision and conciseness of explicit and analytic 
measures, they are an effective means of characterizing 
software quality. 

Table 3-5. Summary of Factor Analyses of Classes of 
Measures (a) 

                                                            Percent of 
                                     No. of      No. of     Variance 
Class of Measures                    Measures    Factors    Explained 

Software Engineering Practices       43          5          80 
Development Team Ability             110         6          82 
Difficulty of Project                54          5          74 
Process and Product Characteristics  47          5          85 
Development Team Background          144         5          86 
Resource Model Parameters            73          6          73 
Additional Detail                    137         6          83 

(a) From Reference 4, Table 4-2, based on a study of 20 flight 
dynamics projects from the SEL data base. 

SECTION 4 - APPLICATIONS OF MEASURES 


Measures play many roles in the management of a software 
development project. These roles are illustrated at a high 
level in Figure 4-1. Measures provide the basis for manage- 
ment decisions about project control. The general roles of 
measures include the following: 

• Predicting and planning--Estimating cost and qual- 
ity to establish a baseline or development plan 

• Reviewing and assessing--Measuring performance and 
quality during development 

• Evaluating and selecting the best technology for an 
ongoing or future project 

The goal of measurement is to detect significant departures 
from historical patterns. The regular and consistent ap- 
plication of measures will enable the software development 
manager to prevent or correct problems quickly and effi- 
ciently. This section describes the uses of some specific 
measures. More general guidelines for software development 
management are given in Reference 38. 

The intent of this section is to demonstrate how measures 
can be used to answer some of the most common management 
questions. The recommendations presented here are not in- 
tended to constitute a complete or final guide to the uses 
of software measures; substantial improvements in this area 
will be forthcoming as additional research is performed. 
However, as Gilb (Reference 14) suggests, the currently 
available (if imperfect) measures should be used until 
others are developed. These specific measures have been 
used successfully in a software production environment. 

Figure 4-1. Role of Measures in Software Development Management 


4.1 PREDICTION AND PLANNING 


Effective software development management depends on reli- 
able measurement and accurate estimation. The manager must 
produce initial estimates of system size, cost, and com- 
pletion date. These estimates are incorporated in a de- 
velopment/management plan. Progress toward completion is 
measured against the plan. Plans and estimates must be 
updated periodically to reflect actual work accomplished. 

This section shows how measures can be used to answer the 
manager's questions about the ultimate size, development 
cost, schedule, maintenance cost, and reliability of a soft- 
ware system. More detailed explanations of planning and 
estimation are presented in References 38 and 39, 
respectively. 

How big will this system be when finished? 

• Number of subsystems 
• Number of modules 
• Lines of code per subsystem 
• Lines of code per module 
• Lines of code developed to date 
• Current growth rate 

An initial estimate of system size can be made by multiply- 
ing the number of subsystems by the average number of lines 
of code per subsystem. Once the high-level design is com- 
plete, an estimate can be made similarly by using the number 
of modules and the average lines of code per module. 

Table 4-1 lists values for these measures derived from SEL 
data. During implementation, the lines of code developed to 
date and the current growth rate can be combined to project 
the size of the completed system. 
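
The size estimates described above reduce to simple arithmetic. The following Python sketch uses the nominal values from Table 4-1. The function names are hypothetical, and the projection during implementation assumes that the current growth rate stays constant for the remaining weeks, which is a simplification made for illustration.

```python
# Nominal values taken from Table 4-1.
LINES_PER_SUBSYSTEM = 7500
LINES_PER_MODULE = 125


def size_from_subsystems(subsystems):
    """Initial size estimate (lines of code) from the subsystem count."""
    return subsystems * LINES_PER_SUBSYSTEM


def size_from_modules(modules):
    """Size estimate (lines of code) once the high-level design is complete."""
    return modules * LINES_PER_MODULE


def projected_size(lines_to_date, lines_per_week, weeks_remaining):
    """Projection during implementation.

    ASSUMPTION: the current growth rate (lines per week) holds steady
    for the remaining weeks of implementation.
    """
    return lines_to_date + lines_per_week * weeks_remaining


if __name__ == "__main__":
    print(size_from_subsystems(4))          # estimate at requirements analysis
    print(size_from_modules(240))           # estimate at preliminary design
    print(projected_size(20000, 1000, 10))  # estimate during implementation
```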

How much will this system cost to develop? 

• Number of subsystems 
• Number of modules 
• Hours per subsystem 
• Hours per module 
• Percent of reused code 
• Expenditures to date 
• Life cycle effort model 

Table 4-1. Basic Estimation Parameters (a) 

Phase and Measure                                          Nominal Value 

Requirements Analysis (b) 
  Size:     Lines of code per subsystem                    7500 
  Cost:     Hours per subsystem                            1850 
  Schedule: Weeks per subsystem per person                 45 

Preliminary Design (b) 
  Size:     Lines of code per module                       125 
  Cost:     Hours per module                               30 
  Schedule: Weeks per module per person                    0.75 

Detailed Design (b) 
  Size:     Relative weight of reused (d) code             0.2 
  Cost:     Hours per developed line of code               0.3 
  Schedule: Weeks per developed module per person          1.0 

Implementation 
  Size:     Percent growth during testing                  10 
  Cost:     Testing percent of total effort                25 
  Schedule: Testing percent of total schedule              30 

System Testing 
  Cost:     Acceptance testing percent of total effort     5 
  Schedule: Acceptance testing percent of total schedule   10 

(a) At end of each phase, based on SEL data. 
(b) Estimates of totals, not required to complete. 
(c) Based on data collected in the flight dynamics environment. 
(d) Does not include extensively modified reused module. 

An initial estimate of system cost can be made by multiplying 
the number of subsystems by the average hours per subsystem. 
Once the high-level design is complete, an estimate 
can be made similarly by using the number of modules and the 
average hours per module. As modules that can be reused 
from other systems are identified, this estimate can be 
refined. Table 4-1 lists values for these measures derived 
from SEL data. 

During development, the expenditures to date can be compared 
with the life cycle effort model (Table 4-2) to project the 
cost to complete the system. As shown in Table 4-2, the 
proportion of the total activity required for each life 
cycle phase is relatively stable. Thus, the actual expend- 
itures to date at the end of any phase can be assumed to 
represent the corresponding percentage of the total expend- 
itures required to complete development. 
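
As a worked illustration of the two cost approaches above, the following Python sketch combines the nominal hours from Table 4-1 with the effort percentages from Table 4-2. The function names and the sample inputs are hypothetical, and the projection rests on the assumption stated in the text that the phase proportions are stable.

```python
# Nominal values taken from Table 4-1.
HOURS_PER_SUBSYSTEM = 1850
HOURS_PER_MODULE = 30

# Percent of total effort by life cycle phase (Table 4-2), in phase order.
EFFORT_PERCENT = [
    ("Requirements Analysis", 6),
    ("Preliminary Design", 8),
    ("Detailed Design", 16),
    ("Implementation", 45),
    ("System Testing", 20),
    ("Acceptance Testing", 5),
]


def initial_cost_hours(subsystems):
    """Initial cost estimate from the subsystem count."""
    return subsystems * HOURS_PER_SUBSYSTEM


def design_cost_hours(modules):
    """Cost estimate once the high-level design identifies the modules."""
    return modules * HOURS_PER_MODULE


def cumulative_effort_percent(completed_phase):
    """Percent of total effort expected to be spent by the end of a phase."""
    total = 0
    for phase, percent in EFFORT_PERCENT:
        total += percent
        if phase == completed_phase:
            return total
    raise ValueError(f"unknown phase: {completed_phase}")


def projected_total_hours(hours_to_date, completed_phase):
    """Scale actual expenditures by the share of effort they should represent."""
    return hours_to_date * 100 / cumulative_effort_percent(completed_phase)


def hours_to_complete(hours_to_date, completed_phase):
    """Remaining effort implied by the projection."""
    return projected_total_hours(hours_to_date, completed_phase) - hours_to_date


if __name__ == "__main__":
    print(initial_cost_hours(4))
    print(projected_total_hours(2220, "Detailed Design"))
    print(hours_to_complete(2220, "Detailed Design"))
```

For example, a project that has spent 2220 hours by the end of detailed design has, under this model, used 30 percent of its total effort.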

When will this system be completed? 

• Weeks per subsystem per person 
• Weeks per module per person 
• Life cycle schedule model 
• Time elapsed to date 
• Modules per week 
• Lines of code per week 
• Modules per subsystem 
• Lines of code per module 

An initial estimate of development time can be made by mul- 
tiplying the number of weeks required per subsystem per per- 
son by the number of subsystems, then dividing by the 
projected staff level (number of persons). Once the high-level 
design is complete, an estimate can be made similarly 
by using the number of weeks required per module per person 
and the number of modules. Table 4-1 lists values for these 
measures derived from SEL data. 
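
The initial schedule calculation can be sketched as follows, using the nominal values from Table 4-1. The function names and the sample staff level are hypothetical and are chosen only to illustrate the arithmetic.

```python
# Nominal values taken from Table 4-1.
WEEKS_PER_SUBSYSTEM_PER_PERSON = 45
WEEKS_PER_MODULE_PER_PERSON = 0.75


def schedule_from_subsystems(subsystems, staff):
    """Development time in weeks from the subsystem count and staff level."""
    return WEEKS_PER_SUBSYSTEM_PER_PERSON * subsystems / staff


def schedule_from_modules(modules, staff):
    """Development time in weeks once the modules have been identified."""
    return WEEKS_PER_MODULE_PER_PERSON * modules / staff


if __name__ == "__main__":
    print(schedule_from_subsystems(4, 6))   # 4 subsystems, 6 persons
    print(schedule_from_modules(240, 6))    # 240 modules, 6 persons
```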

During development, the time elapsed to date can be compared 
with the life cycle schedule model (Table 4-2) to project 
the time required to complete the system. As shown in 
Table 4-2, the proportion of the total time required for 
each life cycle phase is relatively stable. Thus, the time 
elapsed to date at the end of any phase can be assumed to 
represent the corresponding percentage of the total time 
required to complete development. 


Table 4-2. Life Cycle Effort/Schedule Model (a) 

                           Percent of        Percent of 
Life Cycle Phase           Total Schedule    Total Effort 

Requirements Analysis      5                 6 
Preliminary Design         10                8 
Detailed Design            15                16 
Implementation             40                45 
System Testing             20                20 
Acceptance Testing         10                5 

(a) Based on SEL experience. 


The completion time for detailed design can be estimated by 
dividing the number of modules remaining to be designed by 
the current module design rate (modules per week). The completion 
time for implementation can be estimated by dividing 
the number of lines of code remaining to be produced by the 
current software production rate (lines of code per week). 
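
Both completion-time estimates above are the same division of remaining work by the current rate. A minimal Python sketch follows; the function name and the sample figures are hypothetical.

```python
def weeks_to_complete(units_remaining, units_per_week):
    """Remaining work divided by the current production rate.

    Units are modules during detailed design and lines of code
    during implementation.
    """
    if units_per_week <= 0:
        raise ValueError("production rate must be positive")
    return units_remaining / units_per_week


if __name__ == "__main__":
    # Detailed design: 60 modules left, 12 modules designed per week.
    print(weeks_to_complete(60, 12))
    # Implementation: 12000 lines left, 1500 lines produced per week.
    print(weeks_to_complete(12000, 1500))
```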


Is this project on schedule? 

• Stability of plans and staff 
• Computer utilization 
• Software production 
• Staffing expenditures 


Periodic reestimation of software size, cost, and schedule 
can necessitate changes in the development plan and person- 
nel. Many such changes, however, can indicate that the 
development team does not have a good grasp of the software 
problem and is likely to fall farther behind in the future. 

Finishing a project on time depends on the development 
team's doing the right thing, at the right time, fast enough 
to stay on schedule. Measurements of computer utilization 
and software production can suggest the type of activities 
the development team is engaged in and how fast they are 
working. Comparison of actual staff hours expended to date 
with planned expenditures at this date can indicate whether 
or not work is proceeding according to schedule. 

Software production usually progresses at a constant rate 
throughout implementation, as shown in Figure 4-2. Plotting 
the percent completed of the total software estimated 
against the percent of the schedule elapsed indicates proj- 
ect status. A development project that starts producing 
code before the expected start of implementation may be 
working from an inadequate design. Too rapid code produc- 
tion during implementation suggests that inadequate unit 
testing is being performed. Slow code production results in 
the project falling behind schedule. Figure 4-2 identifies 
the regions of the software production graph associated with 
these problems. 

Computer utilization follows software production, increasing 
constantly during implementation. Computer utilization 
should, however, stabilize and then fall rapidly during 
testing as tests are completed. Figure 4-3 shows this pat- 
tern. Significant computer use during design (unless auto- 
mated design tools are used) suggests that coding has 
actually started too early. A low level of computer use 
during implementation indicates that code production and/or 
unit testing is behind schedule. A decline in computer 
utilization at any time during implementation is a sign that 
development has been interrupted. Any of these problems may 
result in an integration crunch during testing when resources 
are added to the project in an effort to complete 
it on schedule. Figure 4-3 identifies the regions of the 
computer utilization graph associated with these phenomena. 

Figure 4-2. Nominal Software Production Pattern 
(horizontal axis: percent of estimated schedule elapsed) 

Figure 4-3. Nominal Computer Utilization Pattern 

How much will this system cost to maintain? 

• Development cost 
• Percent of reused code 
• Implementation error rate 
• Testing error rate 
• Effort to change 

The annual cost of software maintenance is about 25 percent 
of the development cost (Reference 39). However, any given 
project may cost more or less than that to maintain. A high 
error rate (errors per thousand lines of code) during imple- 
mentation and/or testing suggests that a system will cost 
more than usual to maintain. The effort to change (hours 
per change) measured during development is another indicator 
of relative maintenance cost. 

How reliable will this software be? 

• Implementation error rate 
• Testing error rate 
• Software change rate 
• Number of requirements changes 

The implementation and testing error rates (errors per 
thousand lines of code) provide the first indications of the 
reliability of the delivered software product. During test- 
ing, the error rate should peak and begin declining. Fail- 
ure of the error rate to decline during testing suggests 
that many undiscovered errors remain in the software. Late 
requirements changes can also introduce errors and incon- 
sistencies into the system. Some of these effects can be 
traced in the software change rate. 

The software change rate cannot be measured until implemen- 
tation, when software production begins. Development tech- 
niques such as configuration control and online development 
affect the overall change rate. The cumulative change rate 
should, however, increase steadily throughout implementation 
and testing as shown in Figure 4-4. A static (level) change 
rate indicates that testing and error correction are proceeding 
too slowly. A rapid increase in the cumulative 
change rate suggests that the software is unstable, that is, that the 
developers are making ill-considered and possibly contradictory 
changes, perhaps in response to sudden requirements 
changes. Figure 4-4 identifies the regions of the software 
change graph associated with these problems. 

4.2 REVIEW AND ASSESSMENT 

Throughout the software life cycle, the development team 
produces and delivers products that eventually make up the 
completed software system. The manager must evaluate each 
of these products as well as the team's overall performance. 
This section addresses those questions that can be asked 
about the quality of requirements, design, software, test- 
ing, documentation, and performance. Table 4-3 lists nomi- 
nal values for some applicable measures based on SEL data. 
The specific uses of these measures are discussed below. 

Are the requirements complete? 

• To be determined items (TBDs) 
• TBD rate 
• Severity of TBDs 

Although there is no set of measures that answers this question 
directly, experience shows that the number and type of 
"to be determined" items (TBDs), as well as the rate of 
change in the TBDs, are very strong indicators of the completeness 
of requirements. Any set of requirements will 
contain some TBDs, but an excessive amount can indicate 
trouble. An increase in the number of TBDs near the time 
when requirements are due to be completed is an even 
stronger indication that more work must be done before pre- 
liminary design can begin. Such an increase is interpreted 
as a sign that more weaknesses in the requirements are un- 
covered as they are looked at more closely. An assessment 
of the completeness of requirements must also incorporate 
the severity of TBDs. TBDs in specific algorithms and 
tolerances, for example, are much less critical than TBDs in 
external interface formats and operational constraints 
(e.g., memory, timing, data rate). 

Table 4-3. Measures for Assessment 

                                                               Nominal 
Product         Measure                                        Value (a) 

Requirements    TBDs per subsystem                             3 
                Questions per subsystem                        8 
                ECRs per subsystem                             5 

Design          Internal interfaces not defined (%)            5 
                External interfaces not defined (%)            5 
                Modules not defined (%)                        5 

Software (b)    Errors per thousand developed lines of code    7 
                Changes per thousand developed lines of code   14 
                Effort to repair (hours)                       8 
                Effort to change (hours)                       8 
                Modules affected per change                    1 

Testing         Module coverage (%)                            100 
                Function coverage (%)                          100 
                Errors per thousand developed lines of code    3 

Documentation   Pages per module                               2 
                Checklist completeness (%)                     100 

Performance     Developed lines of code per staff hour         3 
                Schedule changes                               5 (c) 
                Reused code (%)                                30 
                Estimate changes                               5 (c) 

(a) Based on SEL historical data for flight dynamics software. 
(b) Measured during implementation. 
(c) Once per phase and build. 

Are the requirements accurate? 

• To be determined items (TBDs) 
• Engineering change requests (ECRs) 

Much effort has been expended by research organizations 
(including the SEL) to develop specific approaches and meas- 
ures for assessing the completeness and accuracy of software 
requirements. Success to date has been very limited. Ap- 
proaches such as traceability matrices, requirements lan- 
guages, and cross-check tables have not been fully effective. 

At this time, two of the more reliable measures for deter- 
mining the accuracy of software requirements are the number 
of TBDs listed in the requirements and the number of engi- 
neering change requests (ECRs) generated during the require- 
ments analysis phase. Exceptionally large values for these 
parameters can indicate that the requirements need to be 
redeveloped. 

Is the design complete? 

• Number of modules not identified 
• Number of modules not defined 
• Number of module interfaces not defined 
• Number of external interfaces not defined 

Although the definition of the design activity and corre- 
sponding criteria can vary from environment to environment, 
the informational content of a complete design is relatively 
standard. Four basic measures of design completeness are 
generally applicable. The structure chart must identify all 
modules (software items) to be produced. Processing de- 
scriptions (PDL or prologs) must be provided for all mod- 
ules. Interfaces among modules (e.g., calling sequences and 
COMMON blocks) must be defined. All external interfaces 
must be defined to the bit level. It is not always possible 
to specify all of these items before starting implementation. 
However, counting the number of TBDs in each area 
provides a good measure of design completeness. More than 
5 percent TBDs in any area is an indication that the design 
is not ready and implementation should be postponed. 
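
The 5 percent rule can be applied mechanically once the TBDs in each area have been counted. The following Python sketch is illustrative only: the function names, the dictionary layout, and the sample counts are hypothetical, and only the 5 percent threshold comes from the text.

```python
# Threshold stated in the text: more than 5 percent TBDs in any area
# indicates that the design is not ready.
TBD_THRESHOLD_PERCENT = 5.0


def percent_undefined(undefined, total):
    """Percentage of items in one design area that are still TBD."""
    return 100.0 * undefined / total


def design_ready(areas):
    """Check every area against the threshold.

    `areas` maps an area name to a pair (undefined_count, total_count).
    """
    return all(
        percent_undefined(undefined, total) <= TBD_THRESHOLD_PERCENT
        for undefined, total in areas.values()
    )


if __name__ == "__main__":
    counts = {
        "modules identified": (0, 120),
        "modules defined": (5, 120),
        "module interfaces": (4, 90),
        "external interfaces": (2, 10),   # 20 percent still TBD
    }
    print(design_ready(counts))
```

In this sample the external interfaces fail the check, so the design as a whole is reported as not ready even though the other three areas pass.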

Is the design effective (relatively the best)? 

• Module strength 
• Module coupling 
• External I/O isolation 

No reliable objective measures of design quality have been 
identified. However, three subjective measures have been 
found to be useful in this context. These measures can be 
determined only by inspecting the module process descrip- 
tions, although efforts continue to develop corresponding 
automatable measures. High module strength (singleness of 
purpose) produces a relatively high-quality design when 
maintainability and robustness are concerns. Another relevant 
measure is module coupling (interdependence). Many 
interdependencies make changes to the software difficult and 
error prone. The degree of external I/O isolation is the 
number of modules accessing external files. Ideally, only 
one module should access each file. Failure to isolate 
external I/O activities often leads to lower reliability. 

Is the software too complex (or is it modular)? 

• Effort to change 
• Effort to repair 
• Modules affected per change 
• Module strength 
• Module coupling 

Although numerous analytic measures have been proposed as 
straightforward means of determining the complexity of soft- 
ware, the SEL has been unable to verify their effectiveness 
in this application (see Section 3.2). Measures that have 
not proved effective include module size (average lines of 
code per module), cyclomatic (and central) complexity, and 
Halstead measures. Many successful software developers 
believe that smaller modules are generally less complex than 
larger ones, and therefore better. SEL research has not, 
however, shown any direct correlation between module size 
and cost or reliability (Reference 37). 

Based on SEL experience, the most effective measures of 
software complexity and modularity are the effort required 
to make a change, the effort required to repair an error, 
and the number of modules affected by a change. Simple 
modular development will minimize these quantities. Values 
for these measures can be determined during implementation 
by monitoring changes and errors. The most reliable measure 
of software complexity and modularity, however, is the judg- 
ment of an experienced software development manager. This 
judgment may be based on an assessment of module strength 
(singleness of purpose) and module coupling (interdepend- 
ence) as well as knowledge of the application area and simi- 
lar systems. 

Is the software maintainable? 

• Effort to change 
• Effort to repair 
• Modules affected per change 
• Errors outstanding 

Because software maintenance is often the most expensive 
phase of the software life cycle, it is important that the 
completed software be easy to maintain. Low complexity and 
good modularity facilitate maintenance, so some of the rel- 
evant measures are the same. Effort required to make a 
change, effort required to repair an error, and number of 
modules affected by a change are good indicators of the rel- 
ative difficulty and cost of software maintenance. Lower 
values for these measures imply better maintainability. 

Another useful measure is the number of errors outstanding. 
Errors are discovered throughout the software life cycle 
and, after some delay, are repaired. When the rate of 
discovery exceeds the rate of repair during maintenance, it 
may be time to redevelop or replace the software. This 
measure may indicate the end of the software life cycle. 
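
The number of errors outstanding can be tracked from periodic counts of errors discovered and errors repaired. The following Python sketch is a simplified illustration: the function names, the reporting periods, and the three-period comparison window are hypothetical choices, not SEL practice.

```python
def outstanding_errors(discovered, repaired):
    """Running count of errors discovered but not yet repaired.

    `discovered` and `repaired` hold counts per reporting period.
    """
    outstanding = 0
    history = []
    for found, fixed in zip(discovered, repaired):
        outstanding += found - fixed
        history.append(outstanding)
    return history


def discovery_exceeds_repair(discovered, repaired, window=3):
    """Compare discovery and repair totals over the most recent periods.

    ASSUMPTION: a window of 3 periods is an arbitrary illustrative choice.
    """
    return sum(discovered[-window:]) > sum(repaired[-window:])


if __name__ == "__main__":
    found = [1, 2, 6, 7, 8]
    fixed = [3, 3, 2, 2, 2]
    print(outstanding_errors(found, fixed))
    print(discovery_exceeds_repair(found, fixed))
```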

Is the software reliable? 

• Error rate 
• Change rate 
• Modules affected per change 


The basic measure of software reliability is how often the 
software fails. It depends on the number and severity of 
errors. However, other indicators are also important in 
deciding how much confidence can be placed in a software 
system. The three measures relied on by the SEL are as 
follows : 


1. Errors per thousand lines of code--This quantity, 
measured during system and acceptance testing, can 
be compared with values from previous systems to 
determine relative reliability. An error rate in 
excess of 3 per 1000 developed lines identifies an 
unreliable system. Furthermore, any increase in 
the error rate late in development indicates a 
problem with system reliability. 

2. Changes per thousand lines of code--This quantity, 
measured during system and acceptance testing, can 
identify reliability problems. Although some 
changes may be requirements changes or clarifica- 
tions, a high change rate usually indicates future 
unreliability. 

3. Number of modules affected per change--Highly 
coupled software tends to propagate errors and 
confound change attempts. A high value for this 
measure indicates that maintenance will be diffi- 
cult and reliability will be low. 
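
The first two measures in the list above are rates per thousand developed lines of code. The following Python sketch shows the calculation and the comparison against the threshold of 3 errors per 1000 developed lines given in item 1. The function names and sample counts are hypothetical.

```python
# Threshold from item 1: more than 3 errors per 1000 developed lines,
# measured during system and acceptance testing.
ERROR_THRESHOLD_PER_KLOC = 3.0


def per_thousand_lines(count, developed_lines):
    """Errors or changes per thousand developed lines of code."""
    return 1000.0 * count / developed_lines


def exceeds_error_threshold(errors, developed_lines):
    """True when the testing error rate is above the stated threshold."""
    return per_thousand_lines(errors, developed_lines) > ERROR_THRESHOLD_PER_KLOC


if __name__ == "__main__":
    print(per_thousand_lines(90, 30000))        # errors per thousand lines
    print(exceeds_error_threshold(90, 30000))
    print(exceeds_error_threshold(120, 30000))
```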

Although many comprehensive reliability models have been 
developed and occasionally successfully applied, SEL 
experience (Reference 37) suggests that they are not very 
effective in the Flight Dynamics environment. 

Is system testing complete (or adequate)? 

• Function coverage 
• Module coverage 
• Error discovery rate 

The two basic approaches to software testing are functional 
and structural. Functional testing attempts to maximize the 
number of functional capabilities tested based on the de- 
scription of functionality contained in the requirements 
specification. Structural testing attempts to maximize the 
number of software structures tested without regard for 
functionality. Approximate measures corresponding to these 
approaches are function coverage and module coverage. 

Function coverage is the percentage of functions identified 
in the requirements that are exercised during system and 
acceptance testing. Module coverage is the percentage of 
modules and other software components that are exercised 
during system and acceptance testing. An effective test 
plan will exercise 100 percent of the functions and modules. 
It is not, however, generally possible to test every line of 
code or every path through the system. SEL research indi- 
cates that good functional testing may exercise only 70 per- 
cent of the code, but that has proven to be adequate 
(Reference 40). The number of individual tests defined in
the test plan is not a good measure of test completeness. 
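
Both coverage measures reduce to the same calculation: the share
of a required set of items that testing exercised at least once. A
minimal Python sketch follows; the function names and the idea of
identifying functions and modules by name are assumptions made for
illustration.

```python
def coverage_percent(required, exercised):
    """Percentage of required items exercised at least once during testing."""
    required = set(required)
    if not required:
        raise ValueError("no required items were supplied")
    return 100.0 * len(required & set(exercised)) / len(required)


def not_exercised(required, exercised):
    """Required items that no test exercised, in sorted order."""
    return sorted(set(required) - set(exercised))
```

Passing the functions named in the requirements gives function
coverage; passing the list of modules gives module coverage. Items
exercised but not required are ignored, so extra tests do not
inflate the percentage.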

The error discovery rate can also indicate when sufficient 
testing has been done. Failure of this rate to decline 
toward the end of planned testing suggests that more testing 
needs to be done. The error discovery rate during mainte- 
nance and operation will be the same as at the end of test- 
ing unless additional effort is expended to find and correct 
errors. 
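
One simple way to check whether the error discovery rate is
declining is to compare the most recent reporting periods with the
periods immediately before them. The Python sketch below does this
for weekly error counts; the weekly granularity, the window of
three periods, and the function name are assumptions, not SEL
practice.

```python
def discovery_rate_declining(errors_per_week, window=3):
    """True if fewer errors were found in the last `window` weeks
    than in the `window` weeks immediately before them."""
    if window <= 0 or len(errors_per_week) < 2 * window:
        raise ValueError("need at least 2 * window weeks of data")
    recent = errors_per_week[-window:]
    earlier = errors_per_week[-2 * window:-window]
    return sum(recent) < sum(earlier)
```

A result of False near the end of planned testing corresponds to
the situation described above, in which more testing is probably
needed.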

Is the documentation appropriate?   • Pages per module
                                    • Checklist completeness
                                    • Subjective assessment
                                    • Expected lifetime

Much useful documentation is generated as an intrinsic part 
of the software development process (e.g., design description).
However, any nontrivial system will require additional
documentation to support users after development is
complete. This documentation must support both maintenance 
(by programmers) and operation (by users). In the flight
dynamics environment studied by the SEL, this information is 
frequently presented in two separate documents, a system 
description and a user’s guide. 

The amount and formality of documentation required depends 
on the size and expected lifetime of the system. Generally, 
about two pages of documentation per module should be pro- 
duced. Excessive documentation can be as awkward as in- 
sufficient documentation. Long-lived systems need more 
detailed and formal documentation. Short-lived systems need 
only minimal documentation. Document completeness can be 
determined by comparison with a checklist of standard con- 
tents. Realistically, document quality can only be deter- 
mined by a subjective assessment. 

Is the product cost effective?      • Productivity
                                    • Reused code
                                    • Error rate
                                    • Effort to change
                                    • Effort to repair

The cost effectiveness of a product is a function of its 
initial cost to develop and subsequent cost to maintain. 
Although the measure "lines of code developed per staff hour
expended" is often criticized as an inadequate measure of
productivity, SEL experience indicates that it is a reliable
way of evaluating the cost effectiveness of development when
consistent historical data are available for comparison.
Lines of code and hours charged must, however, be clearly
defined. Another measure of the cost effectiveness of de- 
velopment is the percentage of reused code in the new sys- 
tem. Reusing previously developed code costs only 
20 percent as much as developing new code. 

The error rate (errors per thousand lines of code), effort
required to make a change, and effort required to repair an
error are good indicators of maintenance cost. A system
that has few errors and that is easily modified will be in- 
expensive to maintain. Low maintenance and development 
costs characterize a cost-effective product. 
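
The development-cost measures can be illustrated with a short
Python sketch. The 20 percent reuse cost factor is the figure given
in the text; the function names and the notion of "equivalent new
lines" are assumptions introduced here to show how that factor
might enter a cost comparison.

```python
# From the text: reused code costs about 20 percent as much as new code.
REUSE_COST_FACTOR = 0.20


def productivity(lines_developed, staff_hours):
    """Lines of code developed per staff hour expended."""
    if staff_hours <= 0:
        raise ValueError("staff_hours must be positive")
    return lines_developed / staff_hours


def equivalent_new_lines(new_lines, reused_lines):
    """Size expressed as the number of new lines that would cost the same."""
    return new_lines + REUSE_COST_FACTOR * reused_lines


def relative_cost(reused_fraction):
    """Cost relative to an all-new system of the same total size."""
    if not 0.0 <= reused_fraction <= 1.0:
        raise ValueError("reused_fraction must be between 0 and 1")
    return (1.0 - reused_fraction) + REUSE_COST_FACTOR * reused_fraction
```

Under this assumption, a system with half of its code reused costs
about 60 percent as much to develop as an all-new system of the
same size. As the text notes, the comparison is meaningful only if
lines of code and hours charged are defined consistently.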

Team performance during development can be monitored by 
plotting cumulative productivity. The starting point of the 
cumulative productivity graph depends on the amount of re- 
used code. Figure 4-5 shows the productivity pattern for a 
project reusing up to 15 percent of code. In the figure, 
productivity increases steadily throughout implementation. 

A very rapid increase in productivity suggests that software 
is being developed without adequate unit testing. Too slow 
an increase implies that development is falling behind 
schedule. 

Extensive reuse of existing code raises the starting level 
of cumulative productivity, and thus its path may be level 
or even declining during implementation. In all cases, 
however, productivity should be level or should decline 
slightly during testing. If productivity continues to 
increase instead, implementation is not complete; coding is 
still in progress. A sharp decline in productivity during 
testing reflects an integration crunch when resources are 
added to the project in an effort to complete it on sched- 
ule. Figure 4-5 identifies the regions of the cumulative 
productivity graph associated with these phenomena. 

Figure 4-5. Nominal Productivity Pattern
(figure not reproduced; axis label: total lines of code per staffing hour)

Is staffing at the right level?     • Staffing profile
                                    • Schedule changes
                                    • Estimate changes
                                    • Code production rate

Two major concerns of a software development manager are the 
team's size and its capabilities. The first concern is 
usually whether the team size is optimum--neither too many
nor too few members. Two effective measures in this regard 
are the size of departures from the planned staffing profile 
(perhaps a Rayleigh curve) and the frequency of schedule and 
milestone changes. 

Departures from planned staffing usually indicate that pro- 
duction will depart from plan, too. Unless major changes 
are made to the system requirements, schedules and mile- 
stones should be adjusted only once at the end of each life 
cycle phase. Frequent changes in size and cost estimates 
can imply that the estimates are being adjusted to fit an 
inappropriate staffing level or that the skill mix of the 
development team is inappropriate to the task. 

During implementation, the code production rate (lines of 
code per month) can be used to project the completion date 
of development (see Section 4.1). Staff can be added or 
subtracted to make that date match the schedule. Rarely 
does individual productivity change during development, so 
the manager should not expect to change the team production 
rate except by altering the staff level and/or skill mix. 
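
The projection described above is simple arithmetic. The Python
sketch below shows it under two stated assumptions: the estimated
total size is known, and team output scales linearly with staff
level at a fixed rate per person. The second assumption follows the
remark that individual productivity rarely changes, but it ignores
start-up and communication costs. All names are hypothetical.

```python
import math


def months_to_complete(estimated_total_lines, lines_completed, lines_per_month):
    """Months of implementation remaining at the current production rate."""
    if lines_per_month <= 0:
        raise ValueError("lines_per_month must be positive")
    remaining = max(estimated_total_lines - lines_completed, 0)
    return remaining / lines_per_month


def staff_needed(estimated_total_lines, lines_completed,
                 months_available, lines_per_person_month):
    """Smallest whole staff level that finishes within the time available,
    assuming a fixed production rate per person (a simplification)."""
    if months_available <= 0 or lines_per_person_month <= 0:
        raise ValueError("months and per-person rate must be positive")
    remaining = max(estimated_total_lines - lines_completed, 0)
    return math.ceil(remaining / (months_available * lines_per_person_month))
```

For example, with 30,000 lines remaining and a team rate of 5,000
lines per month, implementation would finish in 6 months. Finishing
in 4 months at 1,500 lines per person-month would require 5 staff.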

4.3 EVALUATION AND SELECTION 

During the software development process as well as during 
predevelopment planning, the manager must select the tools, 
practices, and techniques most appropriate to the specific 
software development project in progress. Measures facili- 
tate the comparison of the ongoing project with previous 
projects and highlight any special considerations. The 
process of evaluating the overall cost effectiveness of
individual technologies is not considered here.


This section addresses the use of measures to evaluate and 
select appropriate strategies for a planned or ongoing
software development project. The major areas of concern are
development methodology, testing approach, team organiza- 
tion, and level of standards. 


Which methodology is appropriate?   • Percent reused code
                                    • Number of external files
                                    • Similarity to past projects
                                    • Team experience
                                    • Size of project


A methodology can consist of one or more integrated tools, 
techniques, and practices. Methodologies provide the devel- 
opment team with a common form of communication and organize 
its activities into integrated cooperative subactivities. 
Many different methodologies are used in the software engi- 
neering community. The general class of "structured" tech- 
niques is probably the most widely employed. SEL experience 
with over 40 flight dynamics projects has identified five 
principal measures relevant to selecting an appropriate 
software development methodology. 


The percent of reused code is an important consideration 
when deciding whether top-down design, coding, and testing are
to be used. The SEL has found the strict application of 
"top-down" techniques to be less effective as the percent of 
reused code (or design) increases. 

Another relevant parameter is the number of external files 
defined for the project. A large number of external files 
indicates that the software is "data processing" rather than 
"computational." The use of a structured design or struc- 
tured analysis methodology has been found to be more valu- 
able with data processing systems. 

Another consideration is the similarity to past projects.
The SEL has found that, as the similarity to past projects
increases, the need for a completely structured development 
methodology decreases. New project types require a much 
more disciplined development approach. 

A fourth useful measure is the relative experience of the 
development team. Although an experienced team does not 
always use a very structured and disciplined methodology, 
the SEL has found that an experienced team will automati- 
cally select an effective approach. On the other hand, a 
less experienced team should employ a single well-defined 
(typically structured) methodology. 

Project size is also a major consideration when selecting a 
methodology. The manager will find that less formal, less 
structured methodologies are very workable for smaller proj- 
ects (e.g., less than 2 or 3 staff-years). However, larger 
projects (especially those greater than 5 or 6 staff-years 
of effort) need the discipline of a structured approach. 
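
Two of the five measures, project size and team experience, come
with guidance concrete enough to sketch as a rule of thumb. In the
Python sketch below, the cutoffs of 3 and 5 staff-years are one
reading of "2 or 3" and "5 or 6" in the text, and the function name
and return labels are assumptions. The other three measures are
qualitative and are left to the manager.

```python
def recommend_methodology(staff_years, experienced_team):
    """Rule-of-thumb reading of the size and experience guidance.

    Cutoffs of 3 and 5 staff-years are assumed interpretations of the text.
    """
    if staff_years < 0:
        raise ValueError("staff_years must not be negative")
    # Large projects and less experienced teams need a structured approach.
    if staff_years > 5 or not experienced_team:
        return "structured"
    # Small projects with an experienced team can work less formally.
    if staff_years < 3:
        return "informal acceptable"
    # Between the cutoffs the text gives no firm guidance.
    return "manager's judgment"
```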

What testing approach should        • Size of project
be employed?                        • Percent reused code
                                    • Reliability requirements

Software testing and verification can consume the major 
portion of development resources. Consequently, a testing 
strategy must be selected with care. There are two general 
approaches to testing software: functional and structural
(see Section 4.2). These can be implemented by an independent
test team or the development team. The independent test
team often assumes a verification role early in the develop- 
ment process, in which case it is referred to as an inde- 
pendent verification and validation (IV&V) team. Although 
the extent of testing is obviously a function of the soft- 
ware reliability requirements, several other measures also 
may help in selecting the most appropriate approach to
system testing.

Size and required reliability are the important determinants 
of whether or not the IV&V approach is worthwhile. For 
projects with average reliability requirements, IV&V is ef- 
fective only for those projects greater than 20 staff-years 
of effort. IV&V is cost effective, however, for any project 
with an exceptionally high reliability requirement. 


Although the projects studied by the SEL produced generally 
successful (reliable) software with the application of func- 
tional testing, some experiments indicated that for unusu- 
ally high reliability requirements the structural (statement 
and path coverage) approach may be more appropriate. The 
selection of testing approach also depends on the percent of 
reused code. Above 30 percent, functional testing seems to 
be fully adequate. 


What team organization is           • Size of project
appropriate?                        • Team experience
                                    • Similarity to past projects
                                    • Percent reused code


There are many general structures into which a software
development project can be organized. The most common
organization is the chief programmer team (CPT). In addition,
the project can be subdivided into functional teams (e.g.,
quality assurance). The principal alternatives to CPT are fluid
organizations such as the democratic team. The best
organization for a given project depends on a number of
factors, the most important of which is team size: the smaller
the team, the better (Reference 41). Whenever possible, every
team member should be assigned full time.

Project size is the principal criterion for deciding whether 
or not separate quality assurance or configuration control 
teams are called for. If a project is less than 12 to 15
staff-years, it probably will not be cost effective to or- 
ganize it into separate groups with these responsibilities. 

Team experience is the principal criterion when deciding 
whether or not a CPT should be applied. SEL experience in- 
dicates that projects with very experienced personnel have 
not derived any benefit from CPT. On the other hand, teams 
with average or less than average experience with the speci- 
fic application can benefit from CPT. However, the success- 
ful use of CPT requires an application expert with a natural 
capability for the chief programmer role. 

Two other considerations when selecting the team structure 
are the similarity to past projects and percent of reused 
code. SEL experience shows that as these measures increase 
(higher similarity and higher percent of reused code), the
need for the CPT organization and the need for independent 
functional organizations responsible for quality assurance 
and configuration control decrease. 
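
The size and experience criteria for team organization can be
written as two small checks. In the Python sketch below, the range
of 12 to 15 staff-years comes from the text, but treating it as a
"borderline" band is an interpretation, and all names and labels
are hypothetical. Similarity to past projects and percent of reused
code are not modeled, because the text gives only their direction
of effect.

```python
def separate_support_teams(staff_years, lower=12, upper=15):
    """Are separate quality assurance or configuration control teams
    likely to be cost effective, judged by project size alone?"""
    if staff_years < lower:
        return "probably not cost effective"
    if staff_years <= upper:
        return "borderline"
    return "consider separate teams"


def cpt_may_help(very_experienced_team, chief_programmer_available):
    """CPT may benefit teams of average or lower experience, provided an
    application expert suited to the chief programmer role is available."""
    return (not very_experienced_team) and chief_programmer_available
```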

What type and levels of standards   • Size of project
should be applied?                  • Schedule changes
                                    • Change rate
                                    • Error rate
                                    • Similarity to past projects
                                    • Percent reused code

Whether they are called standards, guidelines, policies, or 
something else, some such set of written development prac- 
tices must be prescribed for every project. SEL experience 
with flight dynamics projects shows that, as projects in- 
crease in size, the need for design, coding, and implemen- 
tation standards also increases. Projects of less than 
2 staff-years can be completed quite satisfactorily with 
minimum written standards. 

Three other measures can indicate a need for a change in the 
level of standards during development. If the error rate, 
change rate, or frequency of schedule changes increases, the
manager should reconsider the level and type of development 
standards being applied. Policies should be revised or en- 
forced more strongly if these measures indicate problems. 
Finally, projects with a high percent of reused code and 
high similarity to a past project often benefit from a flex- 
ible set of design, code, and test standards. 

SECTION 5 - CONCLUSIONS


The preceding sections showed that a wide range of software 
development measures are available to the software practi- 
tioner. Some have important applications. The SEL has ar- 
rived at the following general conclusions: 

• Explicit measures are very effective for some soft- 
ware development characteristics. 

• Although analytic measures seem promising and are 
intellectually appealing, their practical value has 
not been demonstrated. 

• Subjective measures are an effective means of
characterizing software quality.

Substantial work remains to be done in all of these areas. 
Formulation, evaluation, and application of measures is a 
continuous activity that contributes to and profits from a 
growing understanding of the software development process. 

A comprehensive system of measurement is a necessary
prerequisite to any effort to evaluate or improve the software
development process and the available software engineering
technologies (References 42 and 43). This document will be
revised and extended as more is learned about measures.

Currently, the SEL is making a major effort to identify 
measures of software size and complexity that can be applied 
early in the software life cycle (during requirements and 
design). The SEL is also attempting to automate the
measurement process throughout the software life cycle. The
ultimate goal of these activities is to produce a management 
tool that will monitor the progress of a software project 
and compare it with a historical data base of similar proj- 
ects, thus allowing the manager to ask and answer questions 
such as those discussed in this document. Reference 44 ex- 
plains these concepts in more detail. 